Interpreting word recognition decisions with a document database graph

نویسندگان

  • Jonathan J. Hull
  • Yanhong Li
چکیده

A method is presented to filter the output of a word recognition algorithm, which may contain errors, to locate decisions that should be correct with a high degree of certainty. The algorithm uses the output of a word recognition system and techniques used in information retrieval to characterize a free-text document database to locate a set of documents that have topics which are similar to that of the input document. The vocabulary from these similar documents is then used to locate the correct word recognition decisions. Experimental results show that a subset of the word recognition decisions for an input document can be located that are between 90 and 99 percent correct. The subset located by this method can be used to drive other recognition processes applied to the rest of the text.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

متن کامل

Holistic Farsi handwritten word recognition using gradient features

In this paper we address the issue of recognizing Farsi handwritten words. Two types of gradient features are extracted from a sliding vertical stripe which sweeps across a word image. These are directional and intensity gradient features. The feature vector extracted from each stripe is then coded using the Self Organizing Map (SOM). In this method each word is modeled using the discrete Hidde...

متن کامل

Word Recognition Result Interpretation Using the Vector Space Model for Information Retrieval

A method is presented to filter the output of a word recognition algorithm, which may contain errors, to locate decisions that should be correct with a high degree of certainty. The algorithm uses the output of a word recognition system and a vector space model for information retrieval to locate a set of documents that have topics which are similar to that of the input document. The vocabulary...

متن کامل

Developing a Standardized Medical Speech Recognition Database for Reconstructive Hand Surgery

Fast and holistic access to the patients’ clinical record is a major requirement of modern medical decision support systems (DSS). While electronic health records (EHRs) have replaced the traditional paper-based records in most healthcare organization, the data entry into these systems remains largely manual. Speech recognition technology promises substitution of the more convenient speech-base...

متن کامل

A New Document Embedding Method for News Classification

Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is lots of news spreading on the web. A text classifier can categorize news automatically and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1993